Next-Day Bitcoin Price Forecast

Author

Nihad & Vikram

Published

June 9, 2024

Required libraries

library(xts)
library(quantmod)
library(ggthemes)
library(dygraphs)
library(tidyverse)
library(urca)
library(tseries)
library(forecast)
library(dplyr)

Read the daily Bitcoin price CSV file

quotes_bitcoin <- read_csv("../data/Bitcoindata.csv", 
                           col_select = c(Date,Close))

We can examine the structure of the resulting object:

head(quotes_bitcoin)
# A tibble: 6 × 2
  timeOpen            close
  <dttm>              <dbl>
1 2018-10-03 00:00:00 6503.
2 2018-10-02 00:00:00 6556.
3 2018-10-01 00:00:00 6590.
4 2018-09-30 00:00:00 6626.
5 2018-09-29 00:00:00 6602.
6 2018-09-28 00:00:00 6644.
tail(quotes_bitcoin)
# A tibble: 6 × 2
  timeOpen            close
  <dttm>              <dbl>
1 2012-01-06 00:00:00  6.60
2 2012-01-05 00:00:00  6.67
3 2012-01-04 00:00:00  5.55
4 2012-01-03 00:00:00  4.90
5 2012-01-02 00:00:00  5.22
6 2012-01-01 00:00:00  5.13
glimpse(quotes_bitcoin)
Rows: 2,468
Columns: 2
$ timeOpen <dttm> 2018-10-03, 2018-10-02, 2018-10-01, 2018-09-30, 2018-09-29, …
$ close    <dbl> 6502.59, 6556.10, 6589.62, 6625.56, 6601.96, 6644.13, 6676.75…

Let’s also check the class of the Close column:

class(quotes_bitcoin$Close)
[1] "numeric"

Let’s check the structure of the whole dataset:

str(quotes_bitcoin)
tibble [2,468 × 2] (S3: tbl_df/tbl/data.frame)
 $ timeOpen: POSIXct[1:2468], format: "2018-10-03" "2018-10-02" ...
 $ close   : num [1:2468] 6503 6556 6590 6626 6602 ...
 - attr(*, "spec")=
  .. cols(
  ..   timeOpen = col_datetime(format = ""),
  ..   timeClose = col_skip(),
  ..   timeHigh = col_skip(),
  ..   timeLow = col_skip(),
  ..   name = col_skip(),
  ..   open = col_skip(),
  ..   high = col_skip(),
  ..   low = col_skip(),
  ..   close = col_double(),
  ..   volume = col_skip(),
  ..   marketCap = col_skip(),
  ..   timestamp = col_skip()
  .. )

Let’s transform the column ‘Date’ into type Date:

quotes_bitcoin$Date <- as.Date(quotes_bitcoin$Date, format = "%d/%m/%Y")

We have to give the format in which the date is originally stored:

* %y means a 2-digit year
* %Y means a 4-digit year
* %m means a month
* %d means a day

class(quotes_bitcoin$Date)
[1] "Date"
head(quotes_bitcoin)
# A tibble: 6 × 2
  Date       Close
  <date>     <dbl>
1 2018-10-04 6548.
2 2018-10-03 6457.
3 2018-10-02 6500 
4 2018-10-01 6571.
5 2018-09-30 6598.
6 2018-09-29 6579.
glimpse(quotes_bitcoin)
Rows: 2,466
Columns: 2
$ Date  <date> 2018-10-04, 2018-10-03, 2018-10-02, 2018-10-01, 2018-09-30, 201…
$ Close <dbl> 6547.56, 6456.77, 6500.00, 6571.20, 6597.81, 6579.38, 6610.76, 6…

Now R understands this column as dates.
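The format codes listed above can be tried on a few literal strings before applying them to a whole column. The following is a minimal sketch in base R; the strings are illustrative and are not taken from the Bitcoin file.

```r
# Illustrative strings only; none of these come from the Bitcoin file.
d1 <- as.Date("04/10/2018", format = "%d/%m/%Y")  # day/month/4-digit year
d2 <- as.Date("04/10/18",   format = "%d/%m/%y")  # day/month/2-digit year
d3 <- as.Date("10/04/2018", format = "%m/%d/%Y")  # month/day/4-digit year

# All three strings describe the same calendar day.
print(c(d1, d2, d3))

# A string that does not match the format yields NA rather than an error,
# so it is worth checking for NA values after the conversion.
bad <- as.Date("2018-10-04", format = "%d/%m/%Y")
print(is.na(bad))
```

A quick `sum(is.na(...))` on the converted column is a cheap way to detect a wrong format string.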

Creating xts objects

quotes_bitcoin <- 
  xts(quotes_bitcoin[, -1], # data columns (without the first column with date)
      quotes_bitcoin$Date)  # date/time index

Let’s see the result:

head(quotes_bitcoin)
              close
2012-01-01 5.132450
2012-01-02 5.218210
2012-01-03 4.898447
2012-01-04 5.546638
2012-01-05 6.671950
2012-01-06 6.602055
str(quotes_bitcoin)
An xts object on 2012-01-01 / 2018-10-03 containing: 
  Data:    double [2468, 1]
  Columns: close
  Index:   Date [2468] (TZ: "UTC")
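Once the prices are stored as an xts object, the training and test windows used later in this report can be cut either by position or by date. The sketch below uses a synthetic series as a stand-in for the closing prices (an assumption); only the 500/1966 split and the start date mirror the report.

```r
library(xts)

# Synthetic daily series standing in for the closing prices (assumption).
set.seed(1)
dates  <- seq(as.Date("2012-01-01"), by = "day", length.out = 2466)
prices <- xts(cumsum(rnorm(2466)) + 100, order.by = dates)
colnames(prices) <- "Close"

# Position-based split: first 500 observations for training, the rest for testing.
train <- prices[1:500]
test  <- prices[501:nrow(prices)]

# Equivalent date-based split using xts range subsetting.
train_by_date <- prices["/2013-05-14"]

print(range(index(train)))
print(nrow(test))
```

Date-based subsetting is convenient here because the window boundaries in the tables are reported as dates.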

Finally, let’s use the ggplot2 package to produce a nice visualization.

The ggplot2 package expects data to be in long format, rather than wide format.

With a single price series the two formats coincide, so it is enough to wrap the xts object in a tibble:

Plotting Actual Bitcoin Price

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), df)) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Actual Bitcoin Price",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  )
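The long format mentioned above matters once several series are plotted together. As a minimal sketch, assuming a hypothetical second column `Open` that is not part of the data used in this report, a wide tibble can be reshaped with `tidyr::pivot_longer()`:

```r
library(tibble)
library(tidyr)

# Small wide table; the Open values are hypothetical and only serve the example.
wide <- tibble(
  Date  = as.Date("2018-10-01") + 0:2,
  Close = c(6571.20, 6500.00, 6456.77),
  Open  = c(6600.00, 6570.00, 6500.00)
)

# One row per (Date, series) pair: the layout ggplot2 works with most easily.
long <- pivot_longer(wide, cols = -Date, names_to = "series", values_to = "value")

print(long)
```

With the long table, `aes(Date, value, colour = series)` would draw one line per series.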

Plotting Log Transformed Bitcoin Price

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), log(quotes_bitcoin))) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "Log Transformed Bitcoin Price",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  )

Plotting 1st Difference Log Operator

tibble(df = quotes_bitcoin) %>%
  ggplot(aes(zoo::index(quotes_bitcoin), periodReturn(quotes_bitcoin, period="daily", type="log"))) +
  geom_line() +
  theme_bw() +
  scale_x_date(date_breaks = "1 year", date_labels = "%b-%Y")+
  labs(
    title = "1st Difference Log Operator",
    subtitle = paste0("Number of observations: ", length(quotes_bitcoin)),
    caption = "source: RR 2024",
    x="",
    y=""
  )
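The "1st difference log operator" plotted above is the daily log return, r_t = log(p_t) - log(p_{t-1}). A minimal base R sketch on illustrative prices (not taken from the data set) shows the equivalent ways of writing it:

```r
# Illustrative prices (assumption), not taken from the data set.
p <- c(100, 105, 102, 110)

# First difference of the log series: r_t = log(p_t) - log(p_{t-1}).
r <- diff(log(p))

# The same quantity written as the log of the price ratio.
r_ratio <- log(p[-1] / p[-length(p)])

print(r)
print(all.equal(r, r_ratio))

# Log returns are additive: their sum recovers the total log change.
print(all.equal(sum(r), log(p[length(p)] / p[1])))
```

Note that `diff()` returns one observation fewer than the input, since the first price has no predecessor.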

Table 1. Stationarity tests of the data.

First in-sample window (500 days)

| Data | Training sample | ADF test (p-value) | PP test (p-value) |
|---|---|---|---|
| Original data | 01/01/2012~14/05/2013 | -1.849 (0.642) | -12.235 (0.427) |
| Log transformed data | 01/01/2012~14/05/2013 | -1.521 (0.781) | -3.828 (0.896) |
| 1st difference log operator | 01/01/2012~14/05/2013 | -9.743 (0.010) | -497.980 (0.010) |

Second in-sample window (2000 days)

| Data | Training sample | ADF test (p-value) | PP test (p-value) |
|---|---|---|---|
| Original data | 01/01/2012~25/06/2017 | 0.617 (0.990) | 5.162 (0.990) |
| Log transformed data | 01/01/2012~25/06/2017 | -1.378 (0.842) | -3.367 (0.918) |
| 1st difference log operator | 01/01/2012~25/06/2017 | -11.478 (0.010) | -2103.646 (0.010) |

ADF: Augmented Dickey-Fuller test; PP: Phillips-Perron test. p-values are in parentheses; a p-value below 0.05 rejects the unit-root null hypothesis, indicating that the series is stationary.
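Tests of this kind can be run with `adf.test()` and `pp.test()` from the tseries package, which is loaded at the top of this report. The sketch below is not the code behind Table 1; it applies both tests to a simulated random walk (an assumption standing in for the log price) and to its first difference.

```r
library(tseries)

set.seed(42)
# Simulated random walk standing in for the log price (assumption).
log_price <- 4.6 + cumsum(rnorm(500, mean = 0, sd = 0.05))
log_ret   <- diff(log_price)

# Null hypothesis of both tests: the series has a unit root (is non-stationary).
# suppressWarnings() hides the notice tseries emits when a statistic falls
# outside its table of critical values.
adf_level <- suppressWarnings(adf.test(log_price))
adf_diff  <- suppressWarnings(adf.test(log_ret))
pp_diff   <- suppressWarnings(pp.test(log_ret))

print(c(adf_level = adf_level$p.value,
        adf_diff  = adf_diff$p.value,
        pp_diff   = pp_diff$p.value))
```

tseries interpolates p-values from a table of critical values, so values at the edge of that table are reported at its bounds; this is probably why Table 1 shows p-values of exactly 0.010 and 0.990.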

Table 2. Training-sample forecast performance.

First training-sample window (500 days)

| Forecast model | Training sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,0) | 01/01/2012~14/05/2013 | 0.063 | 1.317 | 0.033 |
| NNAR (2,1) | 01/01/2012~14/05/2013 | 0.058 | 1.264 | 0.032 |

Second training-sample window (2000 days)

| Forecast model | Training sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 01/01/2012~25/06/2017 | 0.048 | 0.645 | 0.027 |
| NNAR (1,2) | 01/01/2012~25/06/2017 | 0.048 | 0.641 | 0.027 |
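In-sample measures like those in Table 2 can be obtained from the forecast package with `Arima()`, `nnetar()` and `accuracy()`. The following is a sketch on simulated data (an assumption), not the authors' code; the model orders are copied from the first row block of the table, where NNAR (2,1) is read as 2 lagged inputs and 1 hidden node.

```r
library(forecast)

set.seed(123)
# Simulated log-price series standing in for the 500-day training sample (assumption).
y <- ts(4.6 + cumsum(rnorm(500, sd = 0.05)))

# ARIMA(4,1,0): four autoregressive terms on the first-differenced series.
fit_arima <- Arima(y, order = c(4, 1, 0))

# NNAR(2,1): feed-forward network with 2 lagged inputs and 1 hidden node.
fit_nnar <- nnetar(y, p = 2, size = 1)

# accuracy() on a fitted model reports training-set error measures.
acc_arima <- accuracy(fit_arima)
acc_nnar  <- accuracy(fit_nnar)

print(acc_arima[, c("RMSE", "MAPE", "MAE")])
print(acc_nnar[, c("RMSE", "MAPE", "MAE")])
```

Because `nnetar()` starts from random weights, its figures change between runs unless a seed is set.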

(a) Actual and forecasted Bitcoin price (training sample: 500 days, test sample: 1966 days)

(b) Concentrated view on the forecast period (test sample: 1966 days)

(c) Actual and forecasted Bitcoin price (training sample: 2000 days, test sample: 466 days)

(d) Concentrated view on the forecast period (test sample: 466 days)

Table 3. Test-sample static forecast performance.

First test-sample window (1966 days)

Forecast without re-estimation at each step

| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,0) | 15/05/2013~04/10/2018 | 0.373 | 2.924 | 0.230 |
| NNAR (2,1) | 15/05/2013~04/10/2018 | 0.042 | 0.357 | 0.024 |

Forecast with re-estimation at each step

| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA | 15/05/2013~04/10/2018 | 0.312 | 2.668 | 0.205 |
| NNAR | 15/05/2013~04/10/2018 | 0.050 | 0.425 | 0.029 |

Second test-sample window (466 days)

Forecast without re-estimation at each step

| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 26/06/2017~04/10/2018 | 0.026 | 0.098 | 0.009 |
| NNAR (1,2) | 26/06/2017~04/10/2018 | 0.022 | 0.078 | 0.007 |

Forecast with re-estimation at each step

| Forecast model | Test sample | RMSE | MAPE | MAE |
|---|---|---|---|---|
| ARIMA (4,1,1) | 26/06/2017~04/10/2018 | 0.026 | 0.097 | 0.009 |
| NNAR (1,2) | 26/06/2017~04/10/2018 | 0.031 | 0.106 | 0.009 |
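The two forecasting schemes compared in Table 3 can be sketched as follows. This is one possible implementation under stated assumptions (simulated data, an ARIMA(1,1,0) chosen only to keep the example fast, an expanding window for re-estimation) and not necessarily how the results in the table were produced.

```r
library(forecast)

set.seed(7)
# Simulated log prices (assumption): 100 training and 20 test observations.
y    <- ts(4.6 + cumsum(rnorm(120, sd = 0.05)))
n_tr <- 100
test <- as.numeric(window(y, start = n_tr + 1))

# (a) Without re-estimation: estimate once on the training window, then apply
#     the fixed coefficients to the full series. The fitted values over the
#     test period are one-step-ahead forecasts.
fit_train <- Arima(window(y, end = n_tr), order = c(1, 1, 0))
fit_fixed <- Arima(y, model = fit_train)
fc_static <- as.numeric(fitted(fit_fixed))[(n_tr + 1):length(y)]

# (b) With re-estimation: refit on an expanding window before each forecast.
fc_refit <- numeric(length(test))
for (i in seq_along(test)) {
  fit_i <- Arima(window(y, end = n_tr + i - 1), order = c(1, 1, 0))
  fc_refit[i] <- forecast(fit_i, h = 1)$mean[1]
}

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
print(c(static = rmse(test, fc_static), refit = rmse(test, fc_refit)))
```

Re-estimation costs one model fit per test observation, which becomes noticeable for the 1966-day window.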

Table 4. DM test of forecast results.

First test-sample window (1966 days)

|  | Models compared | DM statistic | p-value |
|---|---|---|---|
| DM | ARIMA vs. NNAR (re-estimation) | -37.724 | 3.062208e-246 |
| DM1 | ARIMA vs. NNAR (without re-estimation) | -34.225 | 2.223566e-210 |
| DM2 | ARIMA (re-estimation) vs. ARIMA (without re-estimation) | 18.317 | 2.281731e-70 |
| DM3 | NNAR (re-estimation) vs. NNAR (without re-estimation) | -18.115 | 5.935986e-69 |

Second test-sample window (466 days)

|  | Models compared | DM statistic | p-value |
|---|---|---|---|
| DM | ARIMA vs. NNAR (re-estimation) | 1.036 | 3.004223e-01 |
| DM1 | ARIMA vs. NNAR (without re-estimation) | -19.023 | 2.136618e-75 |
| DM2 | ARIMA (re-estimation) vs. ARIMA (without re-estimation) | 6.177 | 7.611747e-10 |
| DM3 | NNAR (re-estimation) vs. NNAR (without re-estimation) | -13.003 | 1.943571e-37 |

A p-value below 0.05 indicates that the forecast accuracy of the two methods differs significantly; the sign of the DM statistic shows which of the two has the smaller forecast loss.
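A Diebold-Mariano comparison of two sets of forecast errors can be run with `dm.test()` from the forecast package. The sketch below uses simulated errors (an assumption) in which the hypothetical method 2 is clearly more accurate; it is not the code behind Table 4.

```r
library(forecast)

set.seed(99)
n  <- 200
e1 <- rnorm(n, sd = 1.0)  # forecast errors of a hypothetical method 1
e2 <- rnorm(n, sd = 0.5)  # forecast errors of a hypothetical method 2

# Two-sided test on squared-error loss for one-step-ahead forecasts.
# The loss differential is |e1|^power - |e2|^power, so a positive statistic
# means method 1 has the larger loss.
dm <- dm.test(e1, e2, alternative = "two.sided", h = 1, power = 2)

print(dm$statistic)
print(dm$p.value)
```

Since the sign depends on the order in which the two error series are passed, that order has to be known to read a table of DM statistics.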

Box-Pierce residual autocorrelation tests for the ARIMA models used


    Box-Pierce test

data:  et410
X-squared = 27.863, df = 4, p-value = 0.00001329


    Box-Pierce test

data:  et411
X-squared = 27.005, df = 3, p-value = 0.000005873
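The output above is labeled "Box-Pierce test", which is what `Box.test()` prints when its `type` argument is left at the default; the Ljung-Box variant has to be requested explicitly. A minimal sketch on white noise, with `lag = 8` and `fitdf = 4` as assumed values that happen to give the 4 degrees of freedom shown for `et410`:

```r
set.seed(5)
# White noise standing in for model residuals (assumption).
res <- rnorm(500)

# Default type is Box-Pierce; fitdf subtracts the number of fitted ARMA
# coefficients from the degrees of freedom (df = lag - fitdf).
bp <- Box.test(res, lag = 8, fitdf = 4)
lb <- Box.test(res, lag = 8, type = "Ljung-Box", fitdf = 4)

print(bp$method)
print(lb$method)
print(c(bp_df = unname(bp$parameter), lb_df = unname(lb$parameter)))
print(c(bp_p = bp$p.value, lb_p = lb$p.value))
```

In both variants a small p-value points to remaining autocorrelation in the residuals, which is the motivation for the alternative model orders proposed next.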

Proposed improved model: ARIMA (6,1,1) for the 500-day training sample


    Box-Pierce test

data:  et611
X-squared = 5.5026, df = 3, p-value = 0.1385

                      ME       RMSE        MAE       MPE     MAPE     MASE
Training set 0.006948153 0.06167435 0.03395994 0.2132751 1.364582 1.020184
                    ACF1
Training set -0.02906356

Proposed improved model: ARIMA (5,1,0) for the 2000-day training sample


    Box-Pierce test

data:  et510
X-squared = 3.942, df = 2, p-value = 0.1393

                      ME       RMSE        MAE        MPE      MAPE      MASE
Training set 0.002875649 0.04777167 0.02727576 0.06537294 0.6468885 0.9993979
                     ACF1
Training set -0.007808208